Overview

Dataset Statistics

Number of Variables 24
Number of Rows 2.1772e+06
Missing Cells 2.0422e+06
Missing Cells (%) 3.9%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 2.6 GB
Average Row Size in Memory 1.3 KB
Variable Types
  • Categorical: 18
  • Numerical: 6

Dataset Insights

Age_of_Vehicle has 358149 (16.45%) missing values
Driver_IMD_Decile has 734812 (33.75%) missing values
Engine_Capacity_.CC. has 265861 (12.21%) missing values
make has 110845 (5.09%) missing values
model has 325331 (14.94%) missing values
Propulsion_Code has 245843 (11.29%) missing values
Age_of_Vehicle is skewed
Engine_Capacity_.CC. is skewed
Vehicle_Location.Restricted_Lane is skewed
Vehicle_Reference is skewed
Accident_Index has a high cardinality: 1488981 distinct values
make has a high cardinality: 535 distinct values
model has a high cardinality: 35723 distinct values
Accident_Index has constant length 13
Vehicle_Location.Restricted_Lane has 2135969 (98.11%) zeros
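Missing Cells (%) is the number of missing cells divided by the total number of cells (rows × variables). As a consistency check, the sketch below sums the per-column missing counts reported in this document (the six missing-value insights plus the 1317 missing values of Vehicle_Location.Restricted_Lane) and recovers both the 2.0422e+06 total and the 3.9% share. It assumes an exact row count of 2177205, which is inferred rather than stated: it equals the Uppercase Letter count of columns such as Sex_of_Driver, where every value begins with exactly one capital letter. The helper names are illustrative.

```python
# Per-column missing counts copied from this report.
MISSING = {
    "Age_of_Vehicle": 358149,
    "Driver_IMD_Decile": 734812,
    "Engine_Capacity_.CC.": 265861,
    "make": 110845,
    "model": 325331,
    "Propulsion_Code": 245843,
    "Vehicle_Location.Restricted_Lane": 1317,
}

N_ROWS = 2_177_205  # inferred exact row count (reported as 2.1772e+06)
N_VARS = 24


def missing_cell_summary(missing_by_col, n_rows, n_vars):
    """Return (total missing cells, missing cells as a % of all cells)."""
    total = sum(missing_by_col.values())
    pct = 100 * total / (n_rows * n_vars)
    return total, pct


def column_missing_pct(missing, n_rows):
    """Share of one column's rows that are missing, in percent."""
    return 100 * missing / n_rows


total, pct = missing_cell_summary(MISSING, N_ROWS, N_VARS)
print(total, f"{pct:.1f}%")  # 2042158 3.9%
print(f"{column_missing_pct(MISSING['Driver_IMD_Decile'], N_ROWS):.2f}%")
```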

Variables


Accident_Index

categorical

Approximate Distinct Count 1488981
Approximate Unique (%) 68.4%
Missing 0
Missing (%) 0.0%
Memory Size 162.0 MB

Length

Mean 13
Standard Deviation 0
Median 13
Minimum 13
Maximum 13

Sample

1st row 200401BS00001
2nd row 200401BS00002
3rd row 200401BS00003
4th row 200401BS00003
5th row 200401BS00004

Letter

Count 1993730
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1993730
Dash Punctuation 0
Decimal Number 26309935
  • Accident_Index has words of constant length
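The labels in the Letter tables coincide with Unicode general category names (Ll Lowercase Letter, Lu Uppercase Letter, Zs Space Separator, Pd Dash Punctuation, Nd Decimal Number), and in every table Count equals Lowercase Letter plus Uppercase Letter. A minimal sketch, assuming the profiler classifies characters by Unicode category (the report does not say so); `char_class_counts` is a hypothetical helper:

```python
import unicodedata
from collections import Counter

# Unicode general-category codes mapped to the labels used in this report.
LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_class_counts(values):
    """Count characters per Unicode category across all non-missing strings."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            cat = unicodedata.category(ch)
            if cat in LABELS:
                counts[LABELS[cat]] += 1
    # In this report, "Count" equals lowercase plus uppercase letters.
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts


sample = ["200401BS00001", "200401BS00002"]
print(char_class_counts(sample))
```

Each sample index contributes 2 uppercase letters and 11 digits, which is why Accident_Index shows no lowercase, space, or dash characters.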

Age_Band_of_Driver

categorical

Approximate Distinct Count 12
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 152.9 MB

Length

Mean 8.6493
Standard Deviation 5.6504
Median 7
Minimum 5
Maximum 28

Sample

1st row 26 - 35
2nd row 26 - 35
3rd row 26 - 35
4th row 66 - 75
5th row 26 - 35

Letter

Count 4151140
Lowercase Letter 3925852
Space Separator 4813330
Uppercase Letter 225288
Dash Punctuation 1951917
Decimal Number 7915006
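The Length block summarises the number of characters in each value. For example, "26 - 35" has 7 characters, matching the Median, and "Data missing or out of range" has 28, matching the Maximum. A standard-library sketch; whether the report uses the sample or the population standard deviation is not stated, so `statistics.stdev` (sample) is an assumption, as is the helper name `length_stats`.

```python
import statistics


def length_stats(values):
    """Summarise string lengths of non-missing values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "Mean": statistics.mean(lengths),
        # Sample (n - 1) standard deviation; the report's choice is unknown.
        "Standard Deviation": statistics.stdev(lengths),
        "Median": statistics.median(lengths),
        "Minimum": min(lengths),
        "Maximum": max(lengths),
    }


bands = ["26 - 35", "26 - 35", "66 - 75", "Over 75", "Data missing or out of range"]
print(length_stats(bands))
```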

Age_of_Vehicle

numerical

Approximate Distinct Count 88
Approximate Unique (%) 0.0%
Missing 358149
Missing (%) 16.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 27.8 MB
Mean 7.1082
Minimum 1
Maximum 111
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age_of_Vehicle is skewed right (γ1 = 1.223)

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 3
Median 7
Q3 10
95th Percentile 15
Maximum 111
Range 110
IQR 7

Descriptive Statistics

Mean 7.1082
Standard Deviation 4.7259
Variance 22.334
Sum 1.293e+07
Skewness 1.223
Kurtosis 5.716
Coefficient of Variation 0.6649
  • Age_of_Vehicle is not normally distributed (p-value 3.1531491622307152e-12)
  • Age_of_Vehicle has 15587 outliers
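The descriptive statistics are linked by simple identities: Variance is the square of Standard Deviation (4.7259² ≈ 22.334), and Coefficient of Variation is Standard Deviation divided by Mean (4.7259 / 7.1082 ≈ 0.6649). The reported Kurtosis must be excess kurtosis, since Year shows -1.0755 and non-excess kurtosis is never below 1. The exact estimators are not stated, so the sketch below assumes the sample variance (n - 1 denominator) and the Fisher-Pearson skewness g1; `moment_stats` is a hypothetical helper.

```python
import math


def moment_stats(values):
    """Mean, std, variance, skewness, excess kurtosis and CV of non-missing values."""
    data = [v for v in values if v is not None]
    n = len(data)
    mean = sum(data) / n
    # Central moments m2, m3, m4 in population form (divided by n).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    variance = m2 * n / (n - 1)  # sample variance
    std = math.sqrt(variance)
    return {
        "Mean": mean,
        "Standard Deviation": std,
        "Variance": variance,
        "Skewness": m3 / m2 ** 1.5,    # Fisher-Pearson g1 (> 0 means right-skewed)
        "Kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis (0 for a normal distribution)
        "Coefficient of Variation": std / mean,
    }


print(moment_stats([1, 2, 3, 4, 5, None]))
print(moment_stats([1, 1, 1, 10])["Skewness"])  # positive: long right tail
```

The function needs at least two distinct values; a constant column makes m2 zero and skewness undefined.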

Driver_Home_Area_Type

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 160.4 MB
  • The most frequent value (Urban area) occurs over 4.3 times as often as the second most frequent (Data missing or out of range)

Length

Mean 12.2306
Standard Deviation 6.8883
Median 10
Minimum 5
Maximum 28

Sample

1st row Urban area
2nd row Urban area
3rd row Data missing or ou...
4th row Data missing or ou...
5th row Urban area

Letter

Count 23346221
Lowercase Letter 21169016
Space Separator 3282221
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Urban area, Data missing or out of range) together account for over 50.0% of values
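Both frequency insights follow from value counts: the ratio of the most frequent count to the second most frequent, and the combined share of the top two categories. A sketch on toy data (the counts below are made up for illustration, not taken from this dataset); it assumes missing values are excluded from the denominator and that the column has at least two distinct values. `frequency_insights` is a hypothetical helper.

```python
from collections import Counter


def frequency_insights(values):
    """Ratio of the most to second-most frequent value, and the top-2 share (%)."""
    counts = Counter(v for v in values if v is not None)
    (first, n1), (second, n2) = counts.most_common(2)
    total = sum(counts.values())
    return {
        "top": (first, second),
        "ratio": n1 / n2,
        "top2_share_pct": 100 * (n1 + n2) / total,
    }


# Toy column: counts are illustrative only.
toy = (["Urban area"] * 86 + ["Data missing or out of range"] * 20
       + ["Rural"] * 10 + ["Small town"] * 4)
print(frequency_insights(toy))
```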

Driver_IMD_Decile

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 734812
Missing (%) 33.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 22.0 MB
Mean 5.3876
Minimum 1
Maximum 10
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 3
Median 5
Q3 8
95th Percentile 10
Maximum 10
Range 9
IQR 5

Descriptive Statistics

Mean 5.3876
Standard Deviation 2.8217
Variance 7.9617
Sum 7.771e+06
Skewness nan
Kurtosis nan
Coefficient of Variation 0.5237
  • Driver_IMD_Decile is not normally distributed (p-value 0.00038454239325098416)

Engine_Capacity_.CC.

numerical

Approximate Distinct Count 2556
Approximate Unique (%) 0.1%
Missing 265861
Missing (%) 12.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 29.2 MB
Mean 2042.234
Minimum 1
Maximum 96000
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Engine_Capacity_.CC. is skewed right (γ1 = 5.5374)

Quantile Statistics

Minimum 1
5th Percentile 599
Q1 1299
Median 1598
Q3 1997
95th Percentile 6370
Maximum 96000
Range 95999
IQR 698
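Range is Maximum minus Minimum (96000 - 1 = 95999) and IQR is Q3 minus Q1 (1997 - 1299 = 698). The sketch below rebuilds the Quantile Statistics block with the standard library. The report's interpolation method is not stated; `method="inclusive"` (linear interpolation between order statistics) is an assumption, so values computed on the real column may differ slightly. `quantile_summary` is a hypothetical helper.

```python
import statistics


def quantile_summary(values):
    """Quantile Statistics block for non-missing numeric values."""
    data = sorted(v for v in values if v is not None)
    # 99 cut points at 1%, 2%, ..., 99% using linear interpolation.
    pct = statistics.quantiles(data, n=100, method="inclusive")
    q1, median, q3 = pct[24], pct[49], pct[74]
    return {
        "Minimum": data[0],
        "5th Percentile": pct[4],
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th Percentile": pct[94],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "IQR": q3 - q1,
    }


print(quantile_summary(list(range(1, 102)) + [None]))
```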

Descriptive Statistics

Mean 2042.234
Standard Deviation 1950.1432
Variance 3.8031e+06
Sum 3.9034e+09
Skewness 5.5374
Kurtosis 97.0229
Coefficient of Variation 0.9549
  • Engine_Capacity_.CC. is not normally distributed (p-value 3.656466476362716e-22)
  • Engine_Capacity_.CC. has 228396 outliers

Hit_Object_in_Carriageway

categorical

Approximate Distinct Count 13
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 143.9 MB
  • The most frequent value (None) occurs over 60.77 times as often as the second most frequent (Kerb)

Length

Mean 4.31
Standard Deviation 2.0941
Median 4
Minimum 4
Maximum 32

Sample

1st row None
2nd row None
3rd row None
4th row None
5th row None

Letter

Count 9288123
Lowercase Letter 7110918
Space Separator 86483
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (None, Kerb) together account for over 50.0% of values
  • The most frequent word (none) occurs over 60.77 times as often as the second most frequent word (kerb)
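The lowercase insight counts individual words rather than whole categories; the two ratios can differ, as in Skidding_and_Overturning (9.5 for categories, 7.84 for words). A sketch of word-level counting, assuming lowercasing, splitting on runs of letters and digits, and an optional stopword list; the report's actual tokenizer and stopword list are not documented here, and `word_frequencies` is a hypothetical helper.

```python
import re
from collections import Counter


def word_frequencies(values, stopwords=frozenset()):
    """Count lowercase words across all non-missing values, skipping stopwords."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for word in re.findall(r"[a-z0-9]+", value.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


print(word_frequencies(["None", "None", "Kerb", "Other object"]).most_common(2))
print(word_frequencies(["Not at or within 20 metres of junction"],
                       stopwords={"at", "or", "of"}))
```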

Hit_Object_off_Carriageway

categorical

Approximate Distinct Count 13
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 145.7 MB
  • The most frequent value (None) occurs over 35.13 times as often as the second most frequent (Other permanent object)

Length

Mean 5.1806
Standard Deviation 4.5023
Median 4
Minimum 4
Maximum 29

Sample

1st row None
2nd row None
3rd row None
4th row None
5th row None

Letter

Count 10935481
Lowercase Letter 8742441
Space Separator 327827
Uppercase Letter 2193040
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (None, Other permanent object) together account for over 50.0% of values
  • The most frequent word (none) occurs over 35.13 times as often as the second most frequent word (other)

Journey_Purpose_of_Driver

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 172.6 MB

Length

Mean 18.1327
Standard Deviation 7.6936
Median 27
Minimum 5
Maximum 28

Sample

1st row Data missing or ou...
2nd row Data missing or ou...
3rd row Data missing or ou...
4th row Data missing or ou...
5th row Data missing or ou...

Letter

Count 29028743
Lowercase Letter 26295812
Space Separator 4664050
Uppercase Letter 2732931
Dash Punctuation 555726
Decimal Number 3334356
  • The top 2 categories (Not known, Other/Not known (2005-10)) together account for over 50.0% of values
  • The most frequent word (known) occurs over 1.67 times as often as the second most frequent word (not)

Junction_Location

categorical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 221.7 MB
  • The most frequent value (Not at or within 20 metres of junction) occurs over 1.69 times as often as the second most frequent (Approaching junction or waiting/parked at junction approach)

Length

Mean 41.7691
Standard Deviation 12.0784
Median 44
Minimum 17
Maximum 59

Sample

1st row Data missing or ou...
2nd row Data missing or ou...
3rd row Data missing or ou...
4th row Data missing or ou...
5th row Data missing or ou...

Letter

Count 74719015
Lowercase Letter 72118506
Space Separator 13590473
Uppercase Letter 2600509
Dash Punctuation 423304
Decimal Number 1616216
  • The top 2 categories (Not at or within 20 metres of junction, Approaching junction or waiting/parked at junction approach) together account for over 50.0% of values
  • The most frequent word (junction) occurs over 2.99 times as often as the second most frequent word (20)

make

categorical

Approximate Distinct Count 535
Approximate Unique (%) 0.0%
Missing 110845
Missing (%) 5.1%
Memory Size 140.6 MB

Length

Mean 6.3611
Standard Deviation 2.2599
Median 6
Minimum 2
Maximum 26

Sample

1st row ROVER
2nd row BMW
3rd row NISSAN
4th row LONDON TAXIS INT
5th row PIAGGIO

Letter

Count 13043908
Lowercase Letter 0
Space Separator 88049
Uppercase Letter 13043908
Dash Punctuation 8537
Decimal Number 0

model

categorical

Approximate Distinct Count 35723
Approximate Unique (%) 1.9%
Missing 325331
Missing (%) 14.9%
Memory Size 141.0 MB

Length

Mean 14.8426
Standard Deviation 5.6769
Median 15
Minimum 1
Maximum 35

Sample

1st row 45 CLASSIC 16V
2nd row C1
3rd row MICRA CELEBRATION ...
4th row TXII GOLD AUTO
5th row VESPA ET4

Letter

Count 20223682
Lowercase Letter 0
Space Separator 3855404
Uppercase Letter 20223682
Dash Punctuation 220130
Decimal Number 3042038
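Approximate Unique (%) appears to use non-missing values as its denominator: for model, 35723 / (2177205 - 325331) ≈ 1.9%, the reported figure, whereas dividing by all rows gives only 1.6%. The same rule reproduces 68.4% for Accident_Index and 0.1% for Engine_Capacity_.CC.. The row count 2177205 is inferred, not stated (it equals the Uppercase Letter count of columns such as Sex_of_Driver, in which every value starts with one capital letter); `unique_pct` is a hypothetical helper.

```python
def unique_pct(distinct, n_rows, n_missing):
    """Distinct count as a percentage of non-missing values."""
    return 100 * distinct / (n_rows - n_missing)


N_ROWS = 2_177_205  # exact row count inferred from this report

print(f"{unique_pct(35723, N_ROWS, 325331):.1f}%")  # model: 1.9%
print(f"{100 * 35723 / N_ROWS:.1f}%")              # all rows as denominator: 1.6%
```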

Propulsion_Code

categorical

Approximate Distinct Count 12
Approximate Unique (%) 0.0%
Missing 245843
Missing (%) 11.3%
Memory Size 133.1 MB

Length

Mean 7.2567
Standard Deviation 1.5766
Median 6
Minimum 3
Maximum 19

Sample

1st row Petrol
2nd row Petrol
3rd row Petrol
4th row Petrol
5th row Heavy oil

Letter

Count 13224113
Lowercase Letter 11288446
Space Separator 785838
Uppercase Letter 1935667
Dash Punctuation 1642
Decimal Number 0
  • The top 2 categories (Petrol, Heavy oil) together account for over 50.0% of values

Sex_of_Driver

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 144.8 MB
  • The most frequent value (Male) occurs over 2.32 times as often as the second most frequent (Female)

Length

Mean 4.7569
Standard Deviation 1.2171
Median 4
Minimum 4
Maximum 28

Sample

1st row Male
2nd row Male
3rd row Male
4th row Male
5th row Male

Letter

Count 10280326
Lowercase Letter 8103121
Space Separator 76391
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Male, Female) together account for over 50.0% of values
  • The most frequent word (male) occurs over 2.32 times as often as the second most frequent word (female)

Skidding_and_Overturning

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 144.8 MB
  • The most frequent value (None) occurs over 9.5 times as often as the second most frequent (Skidded)

Length

Mean 4.7393
Standard Deviation 2.7479
Median 4
Minimum 4
Maximum 28

Sample

1st row None
2nd row None
3rd row None
4th row None
5th row None

Letter

Count 10227595
Lowercase Letter 8050390
Space Separator 90935
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (None, Skidded) together account for over 50.0% of values
  • The most frequent word (none) occurs over 7.84 times as often as the second most frequent word (skidded)

Towing_and_Articulation

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 174.3 MB
  • The most frequent value (No tow/articulation) occurs over 79.33 times as often as the second most frequent (Articulated vehicle)

Length

Mean 18.9696
Standard Deviation 0.5798
Median 19
Minimum 7
Maximum 28

Sample

1st row No tow/articulatio...
2nd row No tow/articulatio...
3rd row No tow/articulatio...
4th row No tow/articulatio...
5th row No tow/articulatio...

Letter

Count 36982809
Lowercase Letter 34805604
Space Separator 2181609
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (No tow/articulation, Articulated vehicle) together account for over 50.0% of values

Vehicle_Leaving_Carriageway

categorical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 183.7 MB
  • The most frequent value (Did not leave carriageway) occurs over 14.66 times as often as the second most frequent (Nearside)

Length

Mean 23.4767
Standard Deviation 5.1096
Median 25
Minimum 7
Maximum 37

Sample

1st row Did not leave carr...
2nd row Did not leave carr...
3rd row Did not leave carr...
4th row Did not leave carr...
5th row Did not leave carr...

Letter

Count 45154670
Lowercase Letter 42977465
Space Separator 5949112
Uppercase Letter 2177205
Dash Punctuation 2151
Decimal Number 0
  • The top 2 categories (Did not leave carriageway, Nearside) together account for over 50.0% of values

Vehicle_Location.Restricted_Lane

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 1317
Missing (%) 0.1%
Infinite 0
Infinite (%) 0.0%
Memory Size 33.2 MB
Mean 0.1073
Minimum 0
Maximum 9
Zeros 2135969
Zeros (%) 98.1%
Negatives 0
Negatives (%) 0.0%
  • Vehicle_Location.Restricted_Lane is skewed right (γ1 = 8.9186)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 0
95th Percentile 0
Maximum 9
Range 9
IQR 0

Descriptive Statistics

Mean 0.1073
Standard Deviation 0.88
Variance 0.7743
Sum 233482
Skewness 8.9186
Kurtosis 81.1574
Coefficient of Variation 8.2006
  • Vehicle_Location.Restricted_Lane is not normally distributed (p-value 4.242903659070878e-25)
  • Vehicle_Location.Restricted_Lane has 39919 outliers
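The report does not define its outlier rule, but this count is consistent with an IQR-based fence: with Q1 = Q3 = 0 the IQR is 0, so every nonzero value falls outside the fence, and 2177205 rows - 2135969 zeros - 1317 missing = 39919, exactly the reported number. A sketch assuming Tukey's fences with the conventional multiplier k = 1.5 (the multiplier is an assumption; any k gives the same result when IQR = 0). The row count is inferred, and `iqr_outliers` is a hypothetical helper.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; missing (None) values are skipped."""
    data = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]


# Toy column shaped like Restricted_Lane: mostly zeros, two nonzero codes, one missing.
column = [0] * 98 + [1, 9, None]
print(iqr_outliers(column))  # [1, 9]
```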

Vehicle_Manoeuvre

categorical

Approximate Distinct Count 19
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 171.6 MB
  • The most frequent value (Going ahead other) occurs over 4.6 times as often as the second most frequent (Turning right)

Length

Mean 17.6222
Standard Deviation 5.6592
Median 17
Minimum 6
Maximum 35

Sample

1st row Going ahead other
2nd row Going ahead other
3rd row Turning right
4th row Going ahead other
5th row Going ahead other

Letter

Count 33366165
Lowercase Letter 31188960
Space Separator 4577157
Uppercase Letter 2177205
Dash Punctuation 423926
Decimal Number 0
  • The top 2 categories (Going ahead other, Turning right) together account for over 50.0% of values

Vehicle_Reference

numerical

Approximate Distinct Count 63
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 33.2 MB
Mean 1.5534
Minimum 1
Maximum 91
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Vehicle_Reference is skewed right (γ1 = 6.973)

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 1
Median 1
Q3 2
95th Percentile 3
Maximum 91
Range 90
IQR 1

Descriptive Statistics

Mean 1.5534
Standard Deviation 0.7752
Variance 0.601
Sum 3.3821e+06
Skewness 6.973
Kurtosis 354.4992
Coefficient of Variation 0.4991
  • Vehicle_Reference is not normally distributed (p-value 5.777418515514431e-25)
  • Vehicle_Reference has 40337 outliers

Vehicle_Type

categorical

Approximate Distinct Count 24
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 154.3 MB
  • The most frequent value (Car) occurs over 13.02 times as often as the second most frequent (Van / Goods 3.5 tonnes mgw or under)

Length

Mean 9.3151
Standard Deviation 11.4694
Median 3
Minimum 3
Maximum 37

Sample

1st row 109
2nd row 109
3rd row 109
4th row 109
5th row Motorcycle 125cc a...

Letter

Count 15846379
Lowercase Letter 13599788
Space Separator 2436861
Uppercase Letter 2246591
Dash Punctuation 8517
Decimal Number 1435136
  • The top 2 categories (Car, Van / Goods 3.5 tonnes mgw or under) together account for over 50.0% of values
  • The most frequent word (car) occurs over 8.15 times as often as the second most frequent word (goods)

Was_Vehicle_Left_Hand_Drive

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 142.3 MB
  • The most frequent value (No) occurs over 15.99 times as often as the second most frequent (Data missing or out of range)

Length

Mean 3.5297
Standard Deviation 6.1145
Median 3
Minimum 2
Maximum 28

Sample

1st row Data missing or ou...
2nd row Data missing or ou...
3rd row Data missing or ou...
4th row Data missing or ou...
5th row Data missing or ou...

Letter

Count 7045100
Lowercase Letter 4867895
Space Separator 639715
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (No, Data missing or out of range) together account for over 50.0% of values
  • The most frequent word (no) occurs over 15.99 times as often as the second most frequent word (data)

X1st_Point_of_Impact

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 147.4 MB
  • The most frequent value (Front) occurs over 2.58 times as often as the second most frequent (Back)

Length

Mean 6.0121
Standard Deviation 2.4811
Median 5
Minimum 4
Maximum 28

Sample

1st row Front
2nd row Front
3rd row Front
4th row Front
5th row Front

Letter

Count 12827027
Lowercase Letter 10649822
Space Separator 262527
Uppercase Letter 2177205
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Front, Back) together account for over 50.0% of values
  • The most frequent word (front) occurs over 2.58 times as often as the second most frequent word (back)

Year

numerical

Approximate Distinct Count 13
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 33.2 MB
Mean 2010.9341
Minimum 2004
Maximum 2016
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Year is skewed left (γ1 = -0.292)

Quantile Statistics

Minimum 2004
5th Percentile 2004
Q1 2008
Median 2011
Q3 2014
95th Percentile 2016
Maximum 2016
Range 12
IQR 6

Descriptive Statistics

Mean 2010.9341
Standard Deviation 3.6944
Variance 13.6484
Sum 4.3782e+09
Skewness -0.292
Kurtosis -1.0755
Coefficient of Variation 0.001837
  • Year is not normally distributed (p-value 0.0004910449682235168)

Interactions

Correlations

Missing Values